Efficient Methods for Dealing with Missing Data in Supervised Learning
Author
Abstract
We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks. For training, we show how the backpropagation step for an incomplete pattern can be approximated by a weighted averaged backpropagation step. The complexity of the solutions for training and recall is independent of the number of missing features. We verify our theoretical results using one classification and one regression problem.
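The recall idea in the abstract can be illustrated with a minimal sketch. It is one plausible reading of "approximation of the input data distribution using Parzen windows", not necessarily the authors' exact formula. Each complete training input supplies values for the missing features, and its prediction is weighted by a Gaussian kernel evaluated on the observed features only. The names `parzen_recall`, `predict`, `train_inputs`, and `sigma` are hypothetical and chosen for illustration.

```python
import math


def parzen_recall(predict, x, missing, train_inputs, sigma=1.0):
    """Estimate a model's output for input x when some features are missing.

    predict      -- any callable mapping a complete feature list to a number
    x            -- feature list; entries at the indices in `missing` are ignored
    missing      -- set of indices of the missing features
    train_inputs -- complete training inputs used as Parzen window centres
    sigma        -- kernel width (an assumed hyperparameter)
    """
    observed = [i for i in range(len(x)) if i not in missing]
    weights, outputs = [], []
    for t in train_inputs:
        # Squared distance measured on the observed features only.
        d2 = sum((x[i] - t[i]) ** 2 for i in observed)
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        # Complete the pattern with this training point's values.
        filled = [t[i] if i in missing else x[i] for i in range(len(x))]
        outputs.append(predict(filled))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed; fall back to an unweighted average.
        return sum(outputs) / len(outputs)
    return sum(w * o for w, o in zip(weights, outputs)) / total
```

In this sketch the work is one forward pass per training point regardless of how many features are missing, which is consistent with the complexity claim in the abstract.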
Similar resources
Efficient Methods for Dealing with Missing Data in Supervised Learning
Subutai Ahmad, Interval Research Corporation, 1801-C Page Mill Rd., Palo Alto, CA 94304. We present efficient algorithms for dealing with the problem of missing inputs (incomplete feature vectors) during training and recall. Our approach is based on the approximation of the input data distribution using Parzen windows. For recall, we obtain closed form solutions for arbitrary feedforward networks...
Missing or Inapplicable: Treatment of Incomplete Continuous-valued Features in Supervised Learning
Real-world data are often riddled with data quality problems such as noise, outliers and missing values, which present significant challenges for supervised learning algorithms to effectively classify them. This paper explores the ill-effects of inapplicable features on the performance of supervised learning algorithms. In particular, we highlight the difference between missing and inapplicable...
Imputation of Missing Data Using Machine Learning Techniques
A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine le...
Missing Data Imputation for Supervised Learning
This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on ...
Techniques for Dealing with Missing Data in Knowledge Discovery Tasks
Information plays a very important role in our life. Advances in many research fields depend on the ability of discovering knowledge in very large data bases. A lot of businesses base their success on the availability of marketing information. This kind of data is usually big, and not always easy to manage. Scientists from different research areas have developed methods to analyze huge amounts ...
Journal title:
Volume, issue:
Pages: -
Publication date: 1995